Abstract. Network-based marketing refers to a collection of marketing
techniques that take advantage of links between consumers to increase 
sales. We concentrate on the consumer networks formed using direct 
interactions (e.g., communications) between consumers. We survey the 
diverse literature on such marketing with an emphasis on the statis- 
tical methods used and the data to which these methods have been 
applied. We also provide a discussion of challenges and opportunities 
for this burgeoning research topic. Our survey highlights a gap in the 
literature. Because of inadequate data, prior studies have not been able 
to provide direct, statistical support for the hypothesis that network 
linkage can directly affect product/service adoption. Using a new data
set that represents the adoption of a new telecommunications service, 
we show very strong support for the hypothesis. Specifically, we show 
three main results: (1) "Network neighbors" — those consumers linked 
to a prior customer — adopt the service at a rate 3-5 times greater than 
baseline groups selected by the best practices of the firm's marketing 
team. In addition, analyzing the network allows the firm to acquire new 
customers who otherwise would have fallen through the cracks, because 
they would not have been identified based on traditional attributes. (2) 
Statistical models, built with a very large amount of geographic, de- 
mographic and prior purchase data, are significantly and substantially 
improved by including network information. (3) More detailed network 
information allows the ranking of the network neighbors so as to permit 
the selection of small sets of individuals with very high probabilities of 
adoption. 
1. INTRODUCTION

Network-based marketing seeks to increase brand 
recognition and profit by taking advantage of a social
network among consumers. Instances of network-based
marketing have been called word-of-mouth marketing,
diffusion of innovation, buzz marketing and
viral marketing (we do not consider multilevel mar- 
keting, which has become known as "network" mar- 
keting). Awareness or adoption spreads from con- 
sumer to consumer. For example, friends or acquain- 
tances may tell each other about a product or ser- 
vice, increasing awareness and possibly exercising 
explicit advocacy. Firms may use their websites to 
facilitate consumer-to-consumer advocacy via prod- 
uct recommendations (Kautz, Selman and Shah, 1997) 
or via on-line customer feedback mechanisms (Del- 
larocas, 2003). Consumer networks may also provide 
leverage to the advertising or marketing strategy of 
the firm. For example, in this paper we show how 
analysis of a consumer network improves targeted 
marketing. 

This paper makes two contributions. First we sur- 
vey the burgeoning methodological research litera- 
ture on network-based marketing, in particular on 
statistical analyses for network-based marketing. We 
review the research questions posed, and the data 
and analytic techniques used. We also discuss chal- 
lenges and opportunities for research in this area. 
The review allows us to postulate necessary data re- 
quirements for studying the effectiveness of network- 
based marketing and to highlight the lack of current 
research that satisfies those requirements. Specifi- 
cally, research must have access both to direct links 
between consumers and to direct information on the 
consumers' product adoption. Because of inadequate 
data, prior studies have not been able to provide di- 
rect, statistical support (Van den Bulte and Lilien, 
2001) for the hypothesis that network linkage can 
directly affect product/service adoption. 

The second contribution is to provide empirical 
support that network-based marketing indeed can 
improve on traditional marketing techniques. We in- 
troduce telecommunications data that present a nat- 
ural testbed for network-based marketing models, 
in which communication linkages as well as product 
adoption rates can be observed. For these data, we 
show three main results: (1) "Network neighbors"— 
those consumers linked to a prior customer — adopt 
the service at a rate 3-5 times greater than baseline 
groups selected by the best practices of the firm's 
marketing team. In addition, analyzing the network 
allows the firm to acquire new customers who otherwise
would have fallen through the cracks, because they would
not have been identified based on traditional attributes.
(2) Statistical models, built with a
very large amount of geographic, demographic and 
prior purchase data, are significantly and substan- 
tially improved by including network information. 
(3) More sophisticated network information allows 
the ranking of the network neighbors so as to permit 
the selection of small sets of individuals with very 
high probabilities of adoption. 

2. NETWORK-BASED MARKETING 

There are three, possibly complementary, modes 
of network-based marketing. 

Explicit advocacy: Individuals become vocal ad- 
vocates for the product or service, recommending 
it to their friends or acquaintances. Particular indi- 
viduals such as Oprah, with her monthly book club 
reading list, may represent "hubs" of advocacy in 
the consumer relationship network. The success of 
The Da Vinci Code, by Dan Brown, may be due 
to its initial marketing: 10,000 books were delivered 
free to readers thought to be influential enough (e.g., 
individuals, booksellers) to stimulate the traffic in 
paid-for editions (Paumgarten, 2003). When firms
give explicit incentives to consumers to spread in- 
formation about a product via word of mouth, it 
has been called viral marketing, although that term 
could be used to describe any network-based mar- 
keting where the pattern of awareness or adoption 
spreads from consumer to consumer. 

Implicit advocacy: Even if individuals do not speak 
about a product, they may advocate implicitly through 
their actions — especially through their own adop- 
tion of the product. Designer labeling has a long 
tradition of using consumers as implicit advocates. 
Firms commonly capitalize on influential individu- 
als (such as athletes) to advocate products simply 
by conspicuous adoption. More recently, firms have 
tried to induce the same effect by convincing par- 
ticularly "cool" members of smaller social groups to 
adopt products (Gladwell, 1997; Hightower, Brady 
and Baker, 2002). 

Network targeting: The third mode of network- 
based marketing is for the firm to market to prior 
purchasers' social-network neighbors, possibly with- 
out any advocacy at all by customers. For network 
targeting, the firm must have some means to identify 
these social neighbors. 

These three modes may be used in combination. 
A well-cited example of viral marketing combines 
network targeting and implicit advocacy:
The Hotmail free e-mail service appended to the 
bottom of every outgoing e-mail message the hy- 
perlinked advertisement, "Get your free e-mail at 
Hotmail," thereby targeting the social neighbors of 
every current user (Montgomery, 2001), while tak- 
ing advantage of the user's implicit advocacy. Hot- 
mail saw an exponentially increasing customer base. 
Started in July 1996, in the first month alone Hot- 
mail acquired 20,000 customers. By September 1996 
the firm had acquired over 100,000 accounts, and by 
early 1997 it had over 1 million subscribers. 

Traditional marketing methods do not appeal to 
some segments of consumers. Some consumers ap- 
parently value the appearance of being on the cut- 
ting edge or "in the know," and therefore derive 
satisfaction from promoting new, exciting products. 
The firm BzzAgents (Walker, 2004) has managed to 
entice voluntary (unpaid) marketing of new prod- 
ucts. Furthermore, although more and more infor- 
mation has become available on products, parsing 
such information is costly to the consumer. Explicit 
advocacy, such as word-of-mouth advocacy, can be 
a useful way to filter out noise. 

A key assumption of network-based marketing 
through explicit advocacy is that consumers prop- 
agate "positive" information about products after 
they either have been made aware of the product by 
traditional marketing vehicles or have experienced 
the product themselves. Under this assumption, a 
particular subset of consumers may have greater value 
to firms because they have a higher propensity to 
propagate product information (Gladwell, 2002), 
based on a combination of their being particularly 
influential and their having more friends (Richard- 
son and Domingos, 2002). Firms should want to find 
these influencers and to promote useful behavior. 

3. LITERATURE REVIEW 

Many quantitative statistical methods used in em- 
pirical marketing research assume that consumers 
act independently. Typically, many explanatory at- 
tributes are collected on each actor and used in mul- 
tivariate modeling such as regression or tree induc- 
tion. In contrast, network-based marketing assumes 
interdependency among consumer preferences. When 
interdependencies exist, it may be beneficial to ac- 
count for their effects in targeting models. Tradi- 
tionally in statistical research, interdependencies are 
modeled as part of a covariance structure, either
within a particular observational unit (as in the case
of repeated measures experiments) or between ob- 
servational units. Studies of network-based market- 
ing instead attempt to measure these interdepen- 
dencies through implicit links, such as matching on 
geographic or demographic attributes, or through 
explicit links, such as direct observation of commu- 
nications between actors. In this section, we review 
the different types of data and the range of statisti- 
cal methods that have been used to analyze them, 
and we discuss the extent to which these methods 
naturally accommodate networked data. 

Work in network-based marketing spans the fields 
of statistics, economics, computer science, sociology, 
psychology and marketing. In this section, we or- 
ganize prominent work in network-based marketing 
by six types of statistical research: (1) econometric 
modeling, (2) network classification modeling, (3) 
surveys, (4) designed experiments with convenience 
samples, (5) diffusion theory and (6) collaborative 
filtering and recommender systems. In each case, we 
provide an overview of the approach and a discus- 
sion of a prominent example. This (brief) survey is 
not exhaustive. In the final subsection, we discuss 
some of the statistical challenges inherent in incor- 
porating this network structure. 

3.1 Econometric Models 

Econometrics is the application of statistical meth- 
ods to the empirical estimation of economic rela- 
tionships. In marketing this often means the esti- 
mation of two simultaneous equations: one for the 
marketing organization or firm and one for the mar- 
ket. Regression and time-series analysis are found 
at the core of econometric modeling, and economet- 
ric models are often used to assess the impact of a 
target marketing campaign over time. 

Econometric models have been used to study the 
impact of interdependent preferences on rice con- 
sumption (Case, 1991), automobile purchases (Yang 
and Allenby, 2003) and elections (Linden, Smith and 
York, 2003). For each of the aforementioned studies, 
geography is used in part as a proxy for interde- 
pendence between consumers, as opposed to direct, 
explicit communication. However, different methods 
are used in the analysis. Most recently, Yang and 
Allenby (2003) suggested that traditional random 
effects models are not sufficient to measure the in- 
terdependencies of consumer networks. They devel- 
oped a Bayesian hierarchical mixture model where 
interdependence is built into the covariance structure
through an autoregressive process. This framework allows
testing of the presence of interdependence through a single
parameter. It also can incorporate the effects of multiple
networks, each with its
own estimated dependence structure. In their ap- 
plication, they use geography and demography to 
create a "network" of consumers in which links are 
created between consumers who exhibit geographic 
or demographic similarity. The authors showed that 
the geographically defined network of consumers is 
more useful than the demographic network for ex- 
plaining consumer behavior as it relates to purchas- 
ing Japanese cars. Although they do not have data 
on direct communication between consumers, the 
framework presented by Yang and Allenby (2003) 
could be extended to explicit network data where 
links are created between consumers through their 
explicit communication as opposed to demographic 
or geographic similarity. 

A drawback of this approach is that the interdependence
matrix has size n^2, where n is the number of consumers.
Consumer networks are extremely large, which makes
parameter estimation with this method computationally
prohibitive. Sparse matrix techniques or clever clustering
of the observations would be natural extensions.

3.2 Network Classification Models 

Network classification models use knowledge of 
the links between entities in a network to estimate 
a quantity of interest for those entities. Typically, 
in such a model an entity is influenced most by 
those directly connected to it, but is also affected 
to a lesser extent by those further away. Some net- 
work classification models use an entire network to 
make predictions about a particular entity on the 
network; Macskassy and Provost (2004) provided a 
brief survey. However, most methods have been ap- 
plied to small data sets and have not been applied to 
consumer data. Much research in network classifica- 
tion has grown out of the pioneering work by Klein- 
berg (1999) on hubs and authorities on the Inter- 
net, and out of Google's PageRank algorithm (Brin 
and Page, 1998), which (to oversimplify) identifies 
the most influential members of a network by how 
many influential others "point" to them. Although 
neither study uses statistical models, both are re- 
lated to well-understood notions of degree central- 
ity and distance centrality from the field of social- 
network analysis. 
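
The idea that a node is influential when influential others "point" to it can be illustrated with a short power-iteration sketch. This is only a simplified, PageRank-style calculation on a toy network. The function name `pagerank`, the damping value of 0.85 and the toy link structure are illustrative assumptions and are not taken from the studies cited above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores by power iteration.

    links maps each node to the list of nodes it points to.
    All names and defaults here are illustrative assumptions.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share of the total score.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # A node passes its score in equal parts to the nodes it points to.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node with no outgoing links spreads its score over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Toy network: "a" and "b" both point to "c", which points to "d".
toy_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = pagerank(toy_links)
```

In this toy network the node "c" scores higher than "a" or "b" because two nodes point to it, and the scores sum to one at every iteration.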



One paper that models a consumer network for 
maximizing profit is by Richardson and Domingos 
(2002), in which a social network of customers is 
modeled as a Markov random field. The probability 
that a given customer will buy a given product is a 
function of the states of her neighbors, attributes 
of the product and whether or not the customer 
was marketed to. In this framework it is possible 
to assign a "network value" to every customer by 
estimating the overall benefit of marketing to that 
customer, including the impact that the marketing 
action will have on the rest of the network (e.g., 
through word of mouth). The authors tested their 
model on a database of movie reviews from an In- 
ternet site and found that their proposed method- 
ology outperforms non-network methods for esti- 
mating customer value. Their network formulation 
uses implicit links (customers are linked when a customer
reads a review by another customer and subsequently
reviews the item herself) and implicit purchase
information (they assume a review of an item
implies a purchase and vice versa). 

3.3 Surveys 

Most research in this area does not have infor- 
mation on whether consumers actually talk to each 
other. To address this shortcoming, some studies use 
survey sampling to collect comprehensive data on 
consumers' word-of-mouth behavior. By sampling 
individuals and contacting them, researchers can col- 
lect data that are difficult (or impossible) to ob- 
tain directly by observing network-based marketing 
phenomena (Bowman and Narayandas, 2001). The 
strength of these studies lies in the data, includ- 
ing the richness and flexibility of the answers that 
can be collected from the responders. For instance, 
researchers can acquire data about how customers 
found out about a product and how many others 
they told about the product. An advantage is that 
researchers can design their sampling scheme to con- 
trol for any known confounding factors and can de- 
vise fully balanced experimental designs that test 
their hypotheses. Since the purpose of models built 
from survey data is description, simple statistical 
methods like logistic regression or analysis of vari- 
ance (ANOVA) typically are used. 

Bowman and Narayandas (2001) surveyed more 
than 1700 purchasers of 60 different products who 
previously had contacted the manufacturer of that 
product. The purchasers were asked specific ques- 
tions about their interaction with the manufacturer 
and its impact on subsequent word-of-mouth behavior.
The authors were able to capture whether the
customers told others of their experience and if so, 
how many people they told. The authors found that 
self-reported "loyal" customers were more likely to 
talk to others about the products when they were 
dissatisfied, but interestingly not more likely when 
they were satisfied. Although studies like this 
collect some direct data on consumers' word-of-mouth 
behavior, the researchers do not know which 
of the consumers' contacts later purchased the prod- 
uct. Therefore, they cannot address whether word- 
of-mouth actually affects individual sales. 

3.4 Designed Experiments with Convenience 
Samples 

Designed experiments enable researchers to study 
network-based marketing in a controlled setting. Al- 
though the subjects typically comprise a convenience 
sample (such as those undergraduates who answer 
an ad in the school newspaper), the design of the experiment
can be completely randomized. This is unlike
the studies that rely on secondary data sources
or data from the Web. Typically ANOVA is used to 
draw conclusions. 

Frenzen and Nakamoto (1993) studied the factors 
that influence individuals' decisions to disseminate 
information through a market via word-of-mouth. 
The subjects were presented with several scenar- 
ios that represented different products and market- 
ing strategies, and were asked whether they would 
tell trusted and nontrusted acquaintances about the 
product/sale. They studied the effect of the cost/value 
manipulations on the consumers' willingness to share 
information actively with others, as a function of the 
strength of the social tie. In this study, the authors 
did not allow the subjects to construct their explicit 
consumer network; instead, they asked the partici- 
pants to hypothesize about their networks. The ex- 
periments used the data from a convenience sam- 
ple to generalize over a complete consumer network. 
The authors also employed simulations in their study. 
They found that the stronger the moral hazard (the 
risk of problematic behavior) presented by the infor- 
mation, the stronger the ties must be to foster infor- 
mation propagation. Generally, the authors showed 
that network structure and information characteris- 
tics interact when individuals form their information 
transmission decisions. 



3.5 Diffusion Models 

Diffusion theory provides tools, both quantitative 
and qualitative, to assess the likely rate of diffu- 
sion of a technology or product. Qualitatively, re- 
searchers have identified numerous factors that facil- 
itate or hinder technology adoption (Fichman, 2004), 
as well as social factors that influence product adop- 
tion (Rogers, 2003). Quantitative diffusion research 
involves empirical testing of predictions from diffu- 
sion models, often informed by economic theory. 

The most notable and most influential diffusion 
model was proposed by Bass (1969). The Bass model 
of product diffusion predicts the number of users 
who will adopt an innovation at a given time t. It 
hypothesizes that the rate of adoption is a function 
solely of the current proportion of the population 
who have adopted. Specifically, let F(t) be the cu- 
mulative proportion of adopters in the population. 
The diffusion equation, in its simplest form, models 
F(t) as a function of p, the intrinsic adoption rate, 
and q, a measure of social contagion. When q > p, 
this equation describes an S-shaped curve, where
adoption is slow at first, takes off exponentially and 
tails off at the end. This model can effectively model 
word-of-mouth product diffusion at the aggregate, 
societal level. 
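
In its standard textbook form, the Bass diffusion equation is dF/dt = (p + q F(t)) (1 - F(t)) with F(0) = 0, which has the closed-form solution F(t) = (1 - exp(-(p + q) t)) / (1 + (q/p) exp(-(p + q) t)). The following sketch evaluates this solution and the time at which the adoption rate peaks when q > p. The function names and the parameter values are illustrative assumptions, not estimates from the studies discussed here.

```python
import math


def bass_cumulative(t, p, q):
    """Cumulative proportion of adopters F(t) under the Bass model."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_rate(t, p, q):
    """Adoption rate dF/dt = (p + q F)(1 - F) at time t."""
    adopted = bass_cumulative(t, p, q)
    return (p + q * adopted) * (1.0 - adopted)


def bass_peak_time(p, q):
    """Time of the adoption-rate peak; only meaningful when q > p."""
    return math.log(q / p) / (p + q)


# Illustrative parameter values only (assumed, not fitted to any data).
p, q = 0.03, 0.38
t_peak = bass_peak_time(p, q)
curve = [round(bass_cumulative(t, p, q), 3) for t in range(0, 21, 5)]
```

At the peak time the cumulative proportion equals (q - p) / (2q), so a larger social-contagion parameter q relative to p pushes the inflection point of the S-shaped curve toward one half of the eventual adopters.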

In general, the empirical studies that test and 
extend accepted theories of product diffusion rely 
on aggregate-level data for both the customer at- 
tributes and the overall adoption of the product 
(Ueda, 1990; Tout, Evans and Yakan, 2005); they 
typically do not incorporate individual adoption. Mod- 
els of product diffusion assume that network-based 
marketing is effective. Since understanding when dif- 
fusion occurs and the extent to which it is effective is 
important for marketers, these methods may bene- 
fit from using individual-level data. Data on explicit 
networks would enable the extension of existing dif- 
fusion models, as well as the comparison of results 
using individual- versus aggregate-level data. 

In his first study, Bass tested his model empir- 
ically against data for 11 consumer durables. The 
model yielded good predictions of the sales peak 
and the timing of the peak when applied to his- 
torical data. Bass used linear regression to estimate 
the parameters for future sales predictions, measuring
the goodness of fit (R^2 value) of the model for
11 consumer durable products. The success of the 
forecasts suggests that the model may be useful in 
providing long-range forecasting for product sales 
or adoption. There has been considerable follow-up
work on diffusion since this groundbreaking work. 
Mahajan, Muller and Kerin (1984) review this work. 
Recent work on product diffusion explores the ex- 
tent to which the Internet (Fildes, 2003) as well as 
globalization (Kumar and Krishnan, 2002) play a 
role in product diffusion. 

3.6 Collaborative Filtering and Recommender 
Systems 

Recommender systems make personalized recom- 
mendations to individual consumers based on de- 
mographic content and link data (Adomavicius and 
Tuzhilin, 2005). Collaborative filtering methods fo- 
cus on the links between consumers; however, the 
links are not direct. They associate consumers with 
each other based on shared purchases or similar rat- 
ings of shared products. 

Collaborative filtering is related to explicit con- 
sumer network-based marketing because both tar- 
get marketing tasks benefit from learning from data 
stored in multiple tables (Getoor, 2005). For ex- 
ample, Getoor and Sahami (1999), Huang, Chung 
and Chen (2004) and Newton and Greiner (2004) 
established the connection between the recommen- 
dation problem and statistical relational learning 
through the application of probabilistic relational 
models (PRMs) (Getoor, Friedman, Koller and Pfeffer,
2001). However, none of these groups used explicit
links between customers for learning. Recommen- 
dation systems may well benefit from information 
about explicit consumer interaction as an additional, 
perhaps quite important, aspect of similarity. 

3.7 Research Opportunities and Statistical 
Challenges 

We see that there is a burgeoning body of work 
that addresses consumers' interactions and their ef- 
fects on purchasing. To our knowledge the forego- 
ing types represent the main statistical approaches 
taken in research on network-based marketing. In 
each approach, there are assumptions made in the 
data collection or in the analysis that restrict them 
from providing strong and direct support for the hy- 
pothesis that network-based marketing indeed can 
improve on traditional techniques. Surveys and con- 
venience samples can suffer from small and possi- 
bly biased samples. Collaborative filtering models 
have large samples, but do not measure direct links 
between individuals. Models in network classification
and econometrics historically have used proxies like
geography instead of data on direct communications, and
almost all studies have no accurate, specific data on
which (and what) customers purchase.

To paint a complete picture of network influence 
for a particular product, the ideal data set would 
have the following properties: (1) large and unbiased 
sample, (2) comprehensive covariate information on 
subjects, (3) measurement of direct communication 
between subjects and (4) accurate information on 
subjects' purchases. The data set we present in the 
next section has all of these properties and we will 
demonstrate its value for statistical research into 
network influence. The question of how to analyze 
such data brings up many statistical issues: 

Data-set size. Network-based marketing data sets 
often arise from Internet or telecommunications ap- 
plications and can be quite large. When observations 
number in the millions (or hundreds of millions), 
the data become unwieldy for the typical data an- 
alyst and often cannot be handled in memory by 
standard statistical analysis software. Even if the 
data can be loaded, their size renders the interac- 
tive style of analysis common with tools like R or 
S-PLUS painfully slow. In Internet or telecommunications
studies, there often are two massive sources of
data: all actors (web sites, communicators), along 
with their descriptive attributes, and the transac- 
tions among these actors. One solution is to com- 
press the transaction information into attributes to 
be included in the actors' attribute set. It has been 
shown that file squashing (DuMouchel, Volinsky et 
al., 1999), which attempts to combine the best fea- 
tures of preprocessed data with random sampling, 
can be useful for customer attrition prediction. Du- 
Mouchel et al. claimed that squashing can be useful 
when dealing with up to billions of records. However, 
there may be a loss of important information which 
can be captured only by complex network structure. 

More sophisticated network information derived 
from transactional data can also be incorporated 
into the matrix of customer information by deriv- 
ing network attributes such as degree distribution 
and time spent on the network (which we demon- 
strate below). Similarly, other types of data such 
as geographical data or temporal data, which oth- 
erwise would need to be handled by some sophisti- 
cated methodology, can be folded into the analysis 
by creating new covariates. It remains an open ques- 
tion whether clever data engineering can extract all 
useful information to create a set of covariates for 
traditional analysis. For example, knowledge of communication
with specific sets of individuals can be
incorporated, and may provide substantial benefit 
(Perlich and Provost, 2006). 
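
Compressing transaction records into per-actor covariates can be sketched as follows. The record layout (caller, callee, minutes) and the attribute names `degree` and `minutes` are hypothetical and chosen only for illustration; they do not describe the data set analyzed in this paper.

```python
from collections import defaultdict


def network_attributes(transactions):
    """Collapse (caller, callee, minutes) records into per-actor covariates.

    Returns a dict mapping each actor to its degree (number of distinct
    communication partners) and its total communication time.
    """
    neighbors = defaultdict(set)
    minutes = defaultdict(float)
    for caller, callee, duration in transactions:
        # Treat each communication as an undirected link between two actors.
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)
        minutes[caller] += duration
        minutes[callee] += duration
    return {
        actor: {"degree": len(neighbors[actor]), "minutes": minutes[actor]}
        for actor in neighbors
    }


# Hypothetical toy transactions.
calls = [("ann", "bob", 3.0), ("ann", "cat", 2.0), ("bob", "ann", 1.0)]
attributes = network_attributes(calls)
```

The resulting dictionary can be joined to the actors' attribute table, so that a traditional model sees one row per actor with the derived network covariates appended.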

Once the data are combined, the remaining data 
set still may be quite large. While much data min- 
ing research is focused on scaling up the statistical 
toolbox to today's massive data sets, random sam- 
pling remains an effective way to reduce data to a 
manageable size while maintaining the relationships 
we are trying to discover, if we assume the network 
information is fully encoded in the derived variables. 
The amount of sampling necessary will depend on 
the computing environment and the complexity of 
the model, but most modern systems can handle 
data sets of tens or hundreds of thousands of ob- 
servations. When sampling, care must be taken to 
stratify by any attributes that are of particular in- 
terest or to oversample those attributes that have 
extremely skewed distributions. 

Low incidence of response. In applications where 
the response is a consumer's purchase or reaction 
to a marketing event, it is common to have a very 
low response rate, which can result in poor fit and 
reduced ability to detect significant effects for stan- 
dard techniques like logistic regression. If there are 
not many independent attributes, one solution is 
Poisson regression, which is well suited for rare events.
Poisson regression requires forming buckets of observations
based on the independent attributes and
modeling the aggregate response in these buckets as 
a Poisson random variable. This requires discretiza- 
tion of any continuous independent attributes, which 
may not be desirable. Also, if there are even a mod- 
erate number of independent attributes, the buckets 
will be too sparse to allow Poisson modeling. Other 
solutions that have been proposed include oversam- 
pling positive responses and/or undersampling neg- 
ative responses. Weiss (2004) gave an overview of the 
literature on these and related techniques, showing 
that there is mixed evidence as to their effective- 
ness. Other studies of note include the following. 
Weiss and Provost (2003) showed that, given a fixed 
sample size, the optimal class proportion in train- 
ing data varies by domain and by ultimate objec- 
tive (but can be determined); generally speaking, to 
produce probability estimates or rankings, a 50:50 
distribution is a good default. However, Weiss and 
Provost's results are only for tree induction. Japkow- 
icz and Stephen (2002) experimented with neural 
networks and support-vector machines, in addition
to tree induction, showing (among other things) that
support-vector machines are insensitive to class im- 
balance. However, they considered primarily noise- 
free data. Other techniques to deal with unbalanced 
response attributes include ensemble methods (Chan and
Stolfo, 1998; Mease, Wyner and Buja, 2006) and multi-phase
rule induction (Clearwater and Stern, 1991;
Joshi, Kumar and Agarwal, 2001). This is an area 
in need of more systematic empirical and theoretical 
study. 
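
Undersampling negative responses to reach a chosen class proportion in the training data can be sketched as follows. The function name, the record layout and the default 50:50 target are illustrative assumptions. Because a model fitted to such a sample sees an artificially high response rate, its probability estimates would generally need to be adjusted before being read as population rates.

```python
import random


def undersample_negatives(records, label_of, positive_share=0.5, seed=0):
    """Keep all positive responses and a random subset of negatives.

    label_of(record) must return 1 for a positive response.
    positive_share is the target fraction of positives in the sample.
    """
    rng = random.Random(seed)
    positives = [r for r in records if label_of(r) == 1]
    negatives = [r for r in records if label_of(r) != 1]
    # Number of negatives needed so that positives make up positive_share.
    wanted = int(round(len(positives) * (1.0 - positive_share) / positive_share))
    kept = rng.sample(negatives, min(wanted, len(negatives)))
    sample = positives + kept
    rng.shuffle(sample)
    return sample


# Hypothetical data with a 1% response rate.
population = [{"id": i, "y": 1 if i < 10 else 0} for i in range(1000)]
balanced = undersample_negatives(population, lambda r: r["y"])
```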

Separating word-of-mouth from homophily. Unless
there is information about the content of communi- 
cations, one cannot conclude that there was word-of- 
mouth transmission of information about the product. 
Social theory tells us that people who communi- 
cate with each other are more likely to be similar to 
each other, a concept called homophily (Blau, 1977; 
McPherson, Smith-Lovin and Cook, 2001). Homophily 
is exhibited for a wide variety of relationships and 
dimensions of similarity. Therefore, linked consumers 
probably are like-minded, and like-minded consumers 
tend to buy the same products. One way to ad- 
dress this issue in the analysis is to account for 
consumer similarity using propensity scores (Rosen- 
baum and Rubin, 1984). Propensity scores were de- 
veloped in the context of nonrandomized clinical tri- 
als and attempt to adjust for the fact that the statis- 
tical profile of patients who received treatment may 
be different than the profile of those who did not, 
and that these differences could mask or enhance 
the apparent effect of the treatment. Let T repre- 
sent the treatment, X represent the independent 
attributes excluding the treatment and Y represent 
the response. Then the propensity score is PS(x) =
P(T = 1 | X = x). By matching propensity scores in
the treatment and control groups using typical indi- 
cators of homophily like demographic data, we can 
account (partially) for the possible confoundedness 
of other independent attributes. 
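
As a minimal sketch of this adjustment, suppose X is a discrete covariate profile, so that PS(x) can be estimated as the treated fraction among units sharing the same profile. Units are then grouped into propensity-score strata and the treated and control responses are compared within each stratum. The function names, the number of strata and the toy data are assumptions for illustration; with continuous covariates the score would typically be estimated with a model such as logistic regression.

```python
from collections import defaultdict


def propensity_by_cell(units):
    """Estimate PS(x) = P(T = 1 | X = x) as the treated fraction per profile x.

    units is a list of (x, t, y) with x hashable, t in {0, 1}, y the response.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n units, n treated]
    for x, t, _ in units:
        counts[x][0] += 1
        counts[x][1] += t
    return {x: treated / n for x, (n, treated) in counts.items()}


def stratified_effect(units, n_strata=5):
    """Average within-stratum difference in mean response, treated minus control."""
    scores = propensity_by_cell(units)
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in units:
        stratum = min(int(scores[x] * n_strata), n_strata - 1)
        strata[stratum][t].append(y)
    total, weight = 0.0, 0
    for groups in strata.values():
        # Strata lacking either treated or control units carry no comparison.
        if groups[0] and groups[1]:
            size = len(groups[0]) + len(groups[1])
            diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
            total += size * diff
            weight += size
    return total / weight if weight else float("nan")


# Toy data: "young" units are treated more often and respond more often,
# but within each profile the treatment makes no difference.
toy = (
    [("young", 1, 1)] * 4 + [("young", 1, 0)] * 4
    + [("young", 0, 1)] * 1 + [("young", 0, 0)] * 1
    + [("old", 1, 0)] * 2 + [("old", 0, 0)] * 8
)
adjusted = stratified_effect(toy)
```

In the toy data the unadjusted difference in response rates between treated and control units is 0.3, while the stratified estimate is zero, which is the kind of spurious effect that homophily can produce.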

Incorporating extended network structure. Data with 
network structure lend themselves to a robust set of 
network-centric analyses. One simple method (em- 
ployed in our analysis) is to create attributes from 
the network data and plug them into a traditional 
analysis. Another approach is to let each actor be in- 
fluenced by her neighborhood modeled as a Markov 
random field. Domingos and Richardson (2001) used 
this technique to assign every node a "network value." 
A node with high network value (1) has a high prob- 
ability of purchase, (2) is likely to give the product 
a high rating, (3) is influential on its neighbors' ratings
and (4) has neighbors like itself. Hoff, Raftery
and Handcock (2002) defined a Markov-chain Monte 
Carlo method to estimate latent positions of the ac- 
tors for small social-network data sets. This embeds 
the actors in an unobserved "social space," which 
could be more useful than the actual transactions 
themselves for predicting sales. The field of statis- 
tical relational learning (Getoor, 2005) has recently 
produced a wide variety of methods that could be 
applicable. Often these models allow influence to 
propagate through the network. 

Missing data. Missing data in network transac- 
tions are common — often only part of a network is 
observable. For instance, firms typically have trans- 
actional data on their customers only or may have 
one class of communication (e-mail) but not another 
(cellular phone). One attempt to account for these 
missing edges is to use network structure to assign 
a probability of a missing edge everywhere an edge 
is not present. Thresholding this probability creates 
pseudo-edges, which can be added to the network, 
perhaps with a lesser weight (Agarwal and Pregi- 
bon, 2004). This is closely related to the link predic- 
tion problem, which tries to predict where the next 
links will be (Liben-Nowell and Kleinberg, 2003). 
One extension of the PRM framework models link
structure through the use of reference uncertainty
and existence uncertainty. The extension includes a 
unified generative model for both content and re- 
lational structure, where interactions between the 
attributes and link structure are modeled (Getoor, 
Friedman, Koller and Taskar, 2003). 
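
The pseudo-edge idea can be sketched as follows. This is an illustration under stated assumptions, not the method of Agarwal and Pregibon (2004): the Jaccard overlap of two nodes' neighborhoods stands in for the probability of a missing edge, and the names pseudo_edges, threshold and weight are hypothetical.

```python
from itertools import combinations


def pseudo_edges(edges, threshold=0.6, weight=0.5):
    """Return {(u, v): weight} for unobserved pairs scoring at least threshold.

    The score of a pair is the Jaccard overlap of the two neighborhoods,
    used here only as a stand-in for an edge probability.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    observed = {frozenset(edge) for edge in edges}
    result = {}
    for u, v in combinations(sorted(neighbors), 2):
        if frozenset((u, v)) in observed:
            continue  # the edge is already in the network
        union = neighbors[u] | neighbors[v]
        score = len(neighbors[u] & neighbors[v]) / len(union)
        if score >= threshold:
            result[(u, v)] = weight  # pseudo-edge with a lesser weight
    return result


# Hypothetical observed network with five nodes.
observed_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
added = pseudo_edges(observed_edges, threshold=0.6, weight=0.5)
```

The all-pairs loop is quadratic in the number of nodes, so a network of realistic size would need candidate pairs restricted to, for example, nodes two hops apart.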

4. DATA SET AND PRIMARY HYPOTHESIS 

This section details our data set, derived primarily 
from a direct-mail marketing campaign to potential 
customers of a new communications service (later 
we augment the primary data with a large set of 
consumer-specific attributes). The firm's marketing 
team identified and marketed to a list of prospects 
using its standard methods. We investigate whether 
network-related effects or evidence of "viral" infor- 
mation spread are present in this group. As we will 
describe, the firm also marketed to a group we iden- 
tified using the network data, which allows us to 
test our hypotheses further. We are not permitted 
to disclose certain details, including specifics about 
the service being offered and the exact size of the 
data set. 

4.1 Initial Data Details 

In late 2004, a telecommunications firm undertook
a large direct-mail marketing campaign to potential
customers of a new communications service. This
service involved new technology and, because of this,
it was believed that marketing would be most
successful to those consumers who were thought to be
"high tech."

Table 1
Descriptive statistics for the marketing segments (see Section 4.1 for details)

Segment  Loyalty  Intl  Tech1    Tech2  Early Adopt  Offer  % of list  %NN
1        3        Y     Hi       1-7    Med-Hi       P1     1.6        0.63
2        3        Y     Med      1-7    Med-Hi       P1     2.4        1.26
3        2        Y     Hi       1-4    Hi           P1     1.7        0.08
4        2        Y     Med      1-4    Hi           P1     1.7        0.10
5        1        Y     Hi       1-4    Hi           P1     0.1        0.22
6        1        Y     Med      1-4    Hi           P1     0.1        0.25
7        3        N     Hi       1-7    Med-Hi       P2     10.9       0.50
8        3        N     Med      1-7    Med-Hi       P2     13.1       0.83
9        2        N     Hi       1-4    Hi           P2     17.5       0.04
10       2        N     Med      1-4    Hi           P2     11.0       0.07
11       1        N     Hi       1-4    Hi           P2     5.3        0.14
12       1        N     Med      1-4    Hi           P2     7.7        0.25
13       3        N     Hi       1-7    Med-Hi       P2     2.0        0.63
14       1, 2     N     Hi       1-4    Hi           P2     2.0        0.15
15       1        Y     ?        ?      ?            P3     2.0        1.01
16       1        N     ?        ?      ?            P2     1.6        0.46
17       3        N     Hi       1-7    Med-Hi       P2+    2.0        0.70
18       1, 2     N     Hi       1-4    Hi           P2+    2.0        0.15
19       1, 2, 3  Y     Hi       1-7    Med-Hi       P3     1.8        0.67
20       2        N     Hi, Med  1-4    Hi           L1     6.0        0.05
21       2        N     Hi, Med  1-4    Hi           L2     6.0        0.05

In keeping with standard practice, the marketing 
team collected attributes on a large set of prospects — 
consumers whom they believed to be potential 
adopters of the service. The marketing team used 
demographic data, customer relationship data, and 
various other data sources to create profitability and 
behavioral models to identify prospective targets — 
consumers who would receive a targeted mailing. 
The data the marketing team provided us with did 
not contain the underlying customer attributes (e.g., 
demographics), but instead included values for de- 
rived attributes that defined 21 marketing segments 
(Table 1) that were used for campaign management 
and post hoc analyses. The sample included millions 
of consumers. The team believed that the different 
segments would have varying response rates and it 
was important to separate the segments in this way 
to learn the most from the campaign. 

An important derived variable was loyalty, a three-level
score based on previous relationships with the firm,
including previous orders of this and other services.
Roughly, loyalty level 3 comprises customers with
moderate-to-long tenure and/or those who have
subscribed to a number of services in the past. Loyalty
level 2 comprises those customers with whom
the firm has had some limited prior experiences.
Loyalty level 1 comprises consumers who did not 
have service with the firm at the time of mailing; 
little (if any) information is available on them. Pre- 
vious analyses have shown that loyalty and tenure 
attributes have substantial impact on response to 
campaigns. 

Other important attributes were based on demographics
and other customer characteristics. The attribute
Intl is an indicator of whether the prospect
had previously ordered any international services;
Tech1 (hi, med or low) and Tech2 (1-10, where 1 =
high tech) are scores derived from demographics and
other attributes that estimate the interest and ability
of the customer to use a high-tech service; Early
Adopt is a proprietary score that estimates the
likelihood of the customer to use a new product, based
on previous behavior. We also show the Offer,
indicating that different segments received different
marketing messages: P1-P3 indicate different postcards
that were sent, L1 and L2 indicate different
letters, and a "+" indicates that a "call blast"
accompanied the mailing. In defining the segments,
those groups with high loyalty values were permitted
lower values from the technology and early adoption
models. Segments 15 and 16 were provided by an
external vendor; there were insufficient data on these
prospects to fit our Tech and Early Adopt models,
as indicated by a "?" in Table 1.

4.2 Primary Hypothesis and Network Neighbors 

The research goal we consider here is whether
relaxing the assumption of independence between
consumers can demonstrably improve the estimation
of response likelihood. Thus, our first hypothesis is
that someone who has direct communication with
a current subscriber is more likely herself to adopt
the service. It should be noted that the firm knows
only of communications initiated by one of its
customers through a service of the firm, so the network
data are considerably incomplete, especially
for the lower loyalty groups. Data on communications
events include anonymous identifiers for the
transactors, a time stamp and the transaction duration.
For the purposes of this research, all data are
rendered anonymous so that individual identities are
protected.

In pursuit of our hypothesis, we constructed an 
attribute called network neighbor (or NN) — a flag 
that indicates whether the targeted consumer had 
communicated with a current user of the service in a 
time period prior to the marketing campaign. Over- 
all, 0.3% of the targets are network neighbors. In Ta- 
ble 1, the percentage of network neighbors (%NN) 
is broken down by segment. 
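
The construction of the NN flag can be sketched as below. The record layout (caller, callee, time stamp), the function name and the cutoff date are assumptions made for illustration; the sketch also treats a link as undirected, whereas the firm's data contain only communications initiated by its own customers.

```python
from datetime import datetime


def network_neighbor_flags(targets, subscribers, events, campaign_start):
    """Return {target_id: bool}, True if the target communicated with a
    current subscriber before campaign_start.

    events: iterable of (caller_id, callee_id, timestamp) tuples.
    """
    subscribers = set(subscribers)
    flags = {target: False for target in targets}
    for caller, callee, timestamp in events:
        if timestamp >= campaign_start:
            continue  # only communications prior to the campaign count
        if caller in subscribers and callee in flags:
            flags[callee] = True
        if callee in subscribers and caller in flags:
            flags[caller] = True
    return flags


# Hypothetical identifiers and dates.
events = [
    ("s1", "t1", datetime(2004, 9, 1)),    # subscriber s1 contacted t1
    ("t2", "s1", datetime(2004, 12, 15)),  # after the cutoff, ignored
    ("t3", "t2", datetime(2004, 8, 1)),    # no subscriber involved
]
flags = network_neighbor_flags(
    targets=["t1", "t2", "t3"],
    subscribers=["s1"],
    events=events,
    campaign_start=datetime(2004, 11, 1),
)
nn_share = sum(flags.values()) / len(flags)
```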

In addition, the marketing team invited us to cre- 
ate our own segment, which they also would target. 
Our "segment 22" consisted of network neighbors 
that were not already on the current list of targets. 
To make sure our list contained viable prospects, the 
marketing team calculated the derived technology 
and early adopter scores for the consumers on our 
list. They filtered based on these scores, but they 
relaxed the thresholds used to limit their original 
list. For instance, someone with loyalty = 1 needed 
a Tech2 score less than 4 to merit inclusion on the 
initial list; this threshold was relaxed for our list to 
Tech2 less than 7. In this way, the marketing team
allowed prospects who missed inclusion on the first
cut to make it into segment 22 if they were network
neighbors. However, the marketing team still
avoided targeting customers who they believed had
very small probabilities of a purchase. For those
network neighbors who did not score high enough
to warrant inclusion in segment 22, we still tracked
their purchase records to see if any of them subscribed
to the service in the absence of the marketing
campaign; see below. Overall, the profile of the
candidates in our segment 22 was considered to be
subpar in terms of demographics, affinity and
technological capability. Notably, for our final
conclusions, these targets are potential customers the firm
would have otherwise ignored. The size of segment
22 was about 1.2% of the marketing list.

Table 2
Data categories

          Target = Y                        Target = N
NN = Y    NN targets                        NN nontargets
          Segments 1-22
          Relative size = 0.015             Relative size = 0.10
          Prospects identified by           Consumers who were network
          marketing models and who also     neighbors, but were not
          are network neighbors. Those in   marketed to because they
          segment 22 have reduced           scored poorly on marketing
          thresholds on the marketing       models.
          model scores.

NN = N    Non-NN targets                    Non-NN nontargets
          Segments 1-21
          Relative size = 1                 Relative size > 8
          Prospects identified by           Consumers who were not network
          marketing models but who are      neighbors and also were not
          not network neighbors.            considered to be good prospects
                                            by the marketing model.

Notes. The data for our study are broken down into targets and network neighbors. The "relative size" value shows the
number of prospects who show up in each group, relative to the non-NN target group.

To summarize, the above process divides the pros- 
pect universe along two dimensions: (1) targets — 
those consumers identified by the marketing mod- 
els as being worthy of solicitation — and (2) network 
neighbors — those who had direct communication with 
a subscriber. Table 2 shows the relative size for each 
combination (using the non-network-neighbor tar- 
gets as the reference set). Note the non-NN nontar- 
gets, who neither are network neighbors nor are they 
deemed to be good prospects. This group is the ma- 
jority of the prospect space and includes consumers 
that the firm has very little information about, be- 
cause they are low-usage communicators or do not 
subscribe to any services with the firm. 
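
The two-way split and the "relative size" convention of Table 2 can be illustrated with a short sketch. The counts below are hypothetical and were chosen only to mimic the ratios reported in Table 2; the true group sizes are not disclosed.

```python
from collections import Counter


def relative_sizes(prospects):
    """prospects: iterable of (is_target, is_nn) boolean pairs.

    Returns each group's size divided by the size of the non-NN target
    group, which serves as the reference set.
    """
    counts = Counter(prospects)
    reference = counts[(True, False)]
    return {group: n / reference for group, n in counts.items()}


# Hypothetical counts: (is_target, is_nn) -> number of prospects.
toy_counts = {
    (True, False): 200,    # non-NN targets (reference group)
    (True, True): 3,       # NN targets
    (False, True): 20,     # NN nontargets
    (False, False): 1700,  # non-NN nontargets
}
prospects = [group for group, n in toy_counts.items() for _ in range(n)]
sizes = relative_sizes(prospects)
```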

4.3 Modeling with Consumer-Specific Data 

To determine whether relaxing the independence 
assumption (using the network data) improves mod- 
eling, we fit models using a wide range of demo- 
graphic and consumer-specific independent attributes 



(many of which are known or believed to affect the 
estimated likelihood of purchase). Overall, we col- 
lected the values for over 150 attributes to assess 
their effect on sales likelihood and their interactions 
with the network-neighbor variable. These values in- 
cluded the following: 

• Loyalty data: We obtained finer-grained loyalty 
information than the simple categorization described 
above, including past spending, types of service, 
how often the customer responded to prior mail- 
ings, a loyalty score generated by a proprietary 
model and information about length of tenure. 

• Geographic data: Geographic data were necessary 
for the direct mail campaign. These data include 
city, state, zip code, area code and metropolitan 
city code. 

• Demographic data: These include information such 
as gender, education level, credit score, head of 
household, number of children in the household, 
age of members in the household, occupation and 
home ownership. Some of this information was 
inferred at the census tract level from the geo- 
graphic data. 

• Network attributes: As mentioned earlier, we ob- 
served communications of current subscribers with 
other consumers. In addition to the simple network- 
neighbor flag described earlier, we derived more 
sophisticated attributes from prospects' commu- 
nication patterns. We will return to these in Sec- 
tion 5.6. 

4.4 Data Limitations

We encountered missing values for customers across 
all loyalty levels. The amount of missing informa- 
tion is directly related to the level of experience we 
have had with the customer just prior to the direct 
mailing. For example, geography data are available 
for all targets across all three loyalty levels. On the 
other hand, as the number of services and tenure 
with the firm decline, so does the amount of infor- 
mation (e.g., transactions) available for each target. 
Given the difference in information as loyalty varies, 
we grouped customers by loyalty level and treated 
the levels separately in our analyses. This stratifica- 
tion leaves three groups that are mostly internally 
consistent with respect to missing values. 

The overall response rate is very low. As discussed 
above, this presents challenges inherent with a heav- 
ily skewed response variable. For example, an analy- 
sis that stratifies over many different attributes may 
have several strata with no sales at all, rendering 
these strata mostly useless. The data set is large, 
which helps to ameliorate this problem, but in turn 
presents logistical problems with many sophisticated 
statistical analyses. In this paper, we restrict our- 
selves to relatively straightforward analyses. 

4.5 Loyalty Distribution 

A look at the distribution of the loyalty groups
across the four categories of prospects (Figure 1)
shows that the firm targeted customers in the higher
loyalty groups relatively heavily. The network-neighbor
target group appears to skew toward the less loyal
prospects; this is because segment 22, which makes
up a large part of the network-neighbor population,
comprises predominantly low-loyalty consumers.

5. ANALYSIS 

Next we will show direct, statistical evidence that 
consumers who have communicated with prior cus- 
tomers are more likely to become customers. We 
show this in several ways, including using our own 
best efforts to build competing targeting models and 
conducting thorough assessments of predictive abil- 
ity on out-of-sample data. Then we consider more 
sophisticated network attributes and show that tar- 
geting can be improved further. 



5.1 Network-Based Marketing Improves Response

Segmentation provides an ideal setting to test the 
significance and magnitude of any improvement in 
modeling by including network-neighbor informa- 
tion, while stratifying by many attributes known to 
be important, such as loyalty and tenure. The re- 
sponse variable is the take rate for the targets in the 
two months following the direct mailing. The take 
rate is the proportion of the targeted consumers who 
adopted the service within a specified period follow- 
ing the offer. For each segment, we performed a sim- 
ple logistic regression for the independent network- 
neighbor attribute versus the dependent sales re- 
sponse. In Figure 2, we graphically present parame- 
ter estimates (equivalent to log-odds ratios) for the 
network attribute along with 95% confidence inter- 
vals for 20 of the 21 segments (segment 5 had only 
a small number of network-neighbor prospects and 
zero network-neighbor sales, and therefore had an 
infinite log odds). Figure 2 shows that in all 20 seg- 
ments the network-neighbor effect is positive (the 
parameter estimate is greater than zero), demon- 
strating an increased take rate for the network-neighbor 
group within each segment. For 17 of these segments,
the log-odds ratio is significantly different from the
null hypothesis value of 0 (p < 0.05), indicating that
being a network neighbor significantly affected sales
in those segments.
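
For a logistic regression with a single binary predictor, the fitted coefficient equals the log-odds ratio of the corresponding 2 x 2 table, so the per-segment estimates can be sketched without a regression routine. The function below is an illustrative sketch with hypothetical counts; it uses the standard Wald interval and fails when a cell is zero, which is the situation described for segment 5.

```python
import math


def log_odds_ratio(nn_sales, nn_no_sales, non_nn_sales, non_nn_no_sales, z=1.96):
    """Log-odds ratio of sale for NN versus non-NN, with a Wald interval.

    Returns (estimate, (lower, upper)); z = 1.96 gives a 95% interval.
    """
    cells = (nn_sales, nn_no_sales, non_nn_sales, non_nn_no_sales)
    if min(cells) == 0:
        raise ValueError("log-odds ratio is not finite when a cell count is zero")
    estimate = math.log((nn_sales * non_nn_no_sales) / (nn_no_sales * non_nn_sales))
    standard_error = math.sqrt(sum(1 / count for count in cells))
    return estimate, (estimate - z * standard_error, estimate + z * standard_error)


# Hypothetical segment: 30 of 1,000 NN targets and 100 of 10,000 non-NN
# targets adopted the service.
estimate, (lower, upper) = log_odds_ratio(30, 970, 100, 9900)
significant = lower > 0  # the interval excludes the null value of 0
```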

Fig. 2. Results of logistic regression. Parameter estimates
plotted as log-odds ratios with 95% confidence intervals. The
number plotted at the value of the parameter estimate refers
back to segment numbers from Table 1. Horizontal axis: segments
(ordered by log odds).

Fig. 1. Loyalty distribution by customer category. The three bars show the relative sizes of the three loyalty groups for our
four data categories (panels: Target = Y or N by NN = Y or N). The network neighbors (NN) show a much larger proportion
of low-loyalty consumers than the non-NN group.

While odds ratios allow for tests of significance of
an independent variable, they are not as directly
interpretable as comparisons of take rates of the
network-neighbor and non-network-neighbor groups
in a given segment. The take rates for the network
neighbors are plotted versus the non-network
neighbors in Figure 3, where the size of the point is
proportional to the log size of the segment. All segments
have higher take rates in the network-neighbor
subgroup, except for the one segment that had no
network-neighbor sales (the smallest sample size). Over the
entire data set, the network neighbors' take rates
were greater by a factor of 3.4. This value is plotted
in Figure 3 as a dotted line with slope = 3.4. The
right-hand plot of Figure 3 shows the relationship
between each segment's take rate and its lift ratio,
defined as the take rate for NN divided by the take
rate for non-NN. The plot shows that the benefit of
being a network neighbor is greater for those
segments with lower overall take rates.
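
The take rate and lift ratio defined above reduce to two divisions. The sketch below uses hypothetical counts, chosen so that the result matches the overall factor of 3.4 reported in the text; the function names are illustrative.

```python
def take_rate(sales, targeted):
    """Proportion of targeted consumers who adopted the service."""
    return sales / targeted


def lift_ratio(nn_sales, nn_targeted, non_nn_sales, non_nn_targeted):
    """Take rate for NN divided by the take rate for non-NN."""
    return take_rate(nn_sales, nn_targeted) / take_rate(non_nn_sales, non_nn_targeted)


# Hypothetical segment: 17 sales among 1,000 NN targets versus
# 50 sales among 10,000 non-NN targets.
lift = lift_ratio(17, 1000, 50, 10000)
```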

As Figure 3 shows, some of the segments had much 
higher take rates than others. To assess statistical 
significance of the network-neighbor effect after ac- 
counting for this segment effect, we ran a logistic 
regression across all segments, including the main ef- 
fects for the network-neighbor attribute, dummy at- 
tributes for each segment and the interaction terms 
between the two. Two of the interaction terms had 
to be deleted: one from segment 22, which only had 
network-neighbor cases, and one from the segment 
with no sales from the network neighbors. We ran 
a full logistic regression and used stepwise variable 
selection. 

The results of the logistic regression reiterate the
significance of being a network neighbor. The final
model can be found in Table 3. The coefficient of
2.0 for the network-neighbor attribute in the final
model is an estimate of the log-odds ratio, which we
exponentiate to get an odds ratio of 7.49, with a 95%
confidence interval of (5.64, 9.94). More than half
of the segment effects and most of the interactions
between the network-neighbor attribute and those
segment effects are significant. The interpretation of
these interactions is important. Note that the
interaction coefficients are negative and very close in
magnitude to the coefficients of the main effects of
the segments themselves. Therefore, although the
segments themselves are significant, in the presence
of the network attribute the segments' effect is
mostly negated by the interaction effect. Since the
segments represent known important attributes like
loyalty, tenure and demographics, this is evidence
that being a network neighbor is at least as
important in this context.

Table 3
Coefficients and confidence intervals for the final segment model

Attribute               Coeff   (c.i.)          Significance^a
Network neighbor (NN)    2.0    (1.7, 2.3)      **
Segment = 1              1.7    (0.9, 2.5)      **
Segment = 2              1.8    (1.2, 2.4)      **
Segment = 4              2.1    (1.3, 3.0)      **
Segment = 5              1.9    (0.4, 3.3)      **
Segment = 6              1.9    (1.2, 2.5)      **
Segment = 7              1.4    (1.0, 1.9)      **
Segment = 8              1.3    (0.9, 1.7)      **
Segment = 17             1.5    (0.7, 2.2)      **
Segment = 19             2.2    (1.6, 2.9)      **
NN x Segment = 1        -1.1    (-2.1, 0.0)     *
NN x Segment = 2        -0.9    (-1.7, -0.2)    **
NN x Segment = 4        -1.8    (-4.0, 0.4)     **
NN x Segment = 6        -1.5    (-2.6, -0.6)    **
NN x Segment = 7        -1.2    (-1.7, -0.6)    **
NN x Segment = 8        -0.8    (-1.3, -0.4)    **
NN x Segment = 17       -1.6    (-2.8, -0.5)    **
NN x Segment = 19       -1.1    (-1.9, -0.3)    **

^a Significance of the attributes in the logistic regression model
is shown at the 0.05 (*) and 0.01 (**) levels.

Fig. 3. Take rates for marketing segments. Left: For each segment, comparison of the take rate of the non-network neighbors
with that of the network neighbors. The size of the glyph is proportional to the log size of the segment. There is one outlier
not plotted, with a take rate of 11% for the network neighbors and 0.3% for the non-network neighbors. Reference lines are
plotted at x = y and at the overall take-rate ratio of 3.4. Right: Plot of the take rate for the non-network group versus lift
ratio for the network neighbors. Horizontal axis in both panels: take rate for the non-network neighbors.

In Table 4 we present an analysis of deviance ta- 
ble, an analog to analysis of variance used for nested 
logistic regressions (McCullagh and Nelder, 1983). 
The table confirms the significance of the main ef- 
fects and of the interactions. Each level of the nested 
model is significant when a chi-squared approxima- 
tion is used for the differences of the deviances. The 
fact that so many interactions are significant demon- 
strates that the network-neighbor effect varies for 
different segments of the prospect population. 
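
One way to read the interaction terms in Table 3 is as segment-specific adjustments to the NN log-odds ratio: within a segment, the NN effect is the main NN coefficient plus that segment's interaction coefficient. The sketch below applies this to the rounded coefficients from Table 3. Treating segments without a retained interaction term as having the main effect only is an assumption of the sketch, and because the table values are rounded, exp(2.0) does not reproduce the reported 7.49 exactly.

```python
import math

# Rounded coefficients copied from Table 3.
NN_MAIN = 2.0
NN_BY_SEGMENT = {1: -1.1, 2: -0.9, 4: -1.8, 6: -1.5,
                 7: -1.2, 8: -0.8, 17: -1.6, 19: -1.1}


def nn_odds_ratio(segment):
    """Odds ratio for NN versus non-NN within the given segment.

    Segments without a retained interaction term use the main effect only
    (an assumption of this sketch).
    """
    return math.exp(NN_MAIN + NN_BY_SEGMENT.get(segment, 0.0))


baseline = nn_odds_ratio(3)    # no interaction term retained
segment_1 = nn_odds_ratio(1)   # exp(2.0 - 1.1)
```

Every segment-specific odds ratio computed this way remains above 1, which agrees with the positive effects shown in Figure 2.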

5.2 Segment 22 

The segment data enable us to compare take rates 
of network and non-network targets for the segments 
that contained both types of targets. However, many 
of the network-neighbor targets fall into the network- 
only segment 22. Segment 22 comprises prospects 
that the original marketing models deemed not to be 
good candidates for targeting. As we can see from 
the distribution in Figure 1, this segment for the 
most part contains consumers who had no prior re- 
lationship with the firm. 

We compare the take rates for segment 22 with
the take rates for the combined group, including all
of segments 1-21, in the leftmost three bars of
Figure 4. The network-neighbor segment 22 is (not
surprisingly) not as successful as the NN groups in
segments 1-21, since the targets in segments 1-21 were
selected based on characteristics that made them
favorable for marketing. Interestingly, we see that the
segment 22 network neighbors outperform the
non-NN targets from segments 1-21. These segment 22
network neighbors, identified primarily on the
basis of their network activity, were more likely by
almost 3 to 1 to purchase than the more "favorable"
prospects who were not network neighbors. Since
those in segment 22 either were not identified by
marketing analysts or were deemed to be unworthy
prospects, they represent customers who would have
"fallen through the cracks" in the traditional
marketing process.

Fig. 4. Take rates for marketing segments. Take rates for
the network neighbors and non-network neighbors in segments
1-21 compared with the all-network-neighbor segment 22 and
with the nontarget network neighbors. All take rates are
relative to the non-network-neighbor group (segments 1-21).
Bar labels: Network Neighbors, Segs 1-21; Network Neighbors,
Seg 22; Non-Network Neighbors, Segs 1-21; Network Neighbors,
Non-Targets.

Table 4
Analysis of deviance table for the network-neighbor study

Variable                       Deviance   DF   Change in deviance   Significance^a
Intercept                      11200
Segment                        10869      9    63
Segment + NN                   10733      1    370
Segment + NN + interactions    10687      8    41

^a Significance of the group of attributes at each step is shown at the 0.05 (*) and 0.01
(**) levels.

5.3 Improving a Multivariate Targeting Model 

Now we will assess whether the NN attribute can 
improve a multivariate targeting model by incorpo- 
rating all that we know or can find out (over 150 
different attributes) about the targets, including ge- 
ography, demographics and other company-specific 
attributes, from internal and external sources (see 
Section 3.2). 

As discussed in Section 3.7, we tried to address 
(as well as possible) an important causal question 
that arises: Is this network-neighbor effect due to 
word of mouth or simply due to homophily? The 
observed effect may not be indicating viral propa- 
gation, but instead may simply demonstrate a very 
effective way to find like-minded people. This theo- 
retical distinction may not matter much to the firm 
for this particular type of marketing process, but 
is important to make, for example, before design- 
ing future campaigns that try to take advantage of 
word-of-mouth behavior. 

Although we cannot control for unobserved simi- 
larities, we can be as careful as possible in our anal- 
ysis to ensure that the statistical profile of the NN 
prospects is the same as the profile for the non-NN 
cases. Since our data set contains many more non- 
NN cases than NN cases, we match each NN case 
with a single non-NN case that is as close as possi- 
ble to it by calculating propensity scores using all of 
the explanatory attributes considered (as described 
in Section 3.7). At the end of this matching process, 
the NN group is as close as is reasonably possible in 
statistical properties to the non-NN group. 
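
The matching step can be sketched as a greedy one-to-one nearest-neighbor match on the propensity score, without replacement. This is one common variant and not necessarily the exact procedure used in our analysis; the function name and the toy scores are hypothetical.

```python
def match_on_propensity(nn_scores, non_nn_scores):
    """Greedy one-to-one nearest-neighbor matching without replacement.

    nn_scores, non_nn_scores: dicts mapping a case id to its propensity
    score. Returns {nn_id: matched_non_nn_id}.
    """
    available = dict(non_nn_scores)
    matches = {}
    # Process NN cases in order of increasing score.
    for nn_id, score in sorted(nn_scores.items(), key=lambda item: item[1]):
        if not available:
            break  # no non-NN cases left to match
        best = min(available, key=lambda cid: abs(available[cid] - score))
        matches[nn_id] = best
        del available[best]  # each non-NN case is used at most once
    return matches


# Hypothetical propensity scores.
nn = {"n1": 0.30, "n2": 0.62}
non_nn = {"c1": 0.10, "c2": 0.33, "c3": 0.60, "c4": 0.90}
pairs = match_on_propensity(nn, non_nn)
```

Each NN case costs a scan over the remaining non-NN cases, so for data of the size described here a sorted index or a caliper on the score difference would be a natural refinement.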

Due to heterogeneity of data sources across the
three loyalty groups, we used the propensity scores
to create a matched data set for each group. For each
group individually, we fitted a full logistic regression
including interactions and selected a final model
using stepwise variable selection. All attributes were
checked for outliers, transformations and collinearity
with other attributes, and we removed or combined
the attributes that accounted for any significant
correlations.

Table 5 shows the results of the logistic regres- 
sions, which show the attributes that were found 
to be significant, those that were negatively cor- 
related with take rate, and those that had inter- 
actions with the NN attribute. Each of the three 
models found the network-neighbor attribute to be 
significant along with several others. The signifi- 
cant attributes tended to be attributes regarding 
the prospects' previous relationships with the firm, 
such as previous international services, tenure with 
firm, churn identifiers and revenue spent with the 
firm. These attributes are typically correlated with 
demographic attributes, which explains the lack of 
significance of many of the demographic attributes 
considered. Interestingly, tenure with firm is
significant in loyalty groups 3 and 2, but with different
signs. In the most loyal group, tenure is negatively
correlated, but in the mid-level loyalty group it is
positive. This unexpected result may be due to
differing compositions of the two groups; those
consumers with long tenure in the most loyal group
might be people who just never change services,
while long tenure in the other group might be an
indicator that they are gaining more trust in the
company. In loyalty group 1, there is limited
information about previous services with the firm. For
those customers, knowing whether the customer has
responded to any previous marketing campaigns has
a significant effect.

Table 5 also shows parameter estimates for NN
and the take rates in the three loyalty groups. The
take rates are highest in the group with the most
loyalty but, interestingly, this group gets the least
lift (smallest parameter estimate) from the NN
attribute. So the impact of the network-neighbor
attribute is stronger for those market segments with
lower loyalty, where actual take rates are weakest.

Table 5
Results of multivariate model

                          Loyalty 3                     Loyalty 2                    Loyalty 1
Significant attributes    NN                            NN                           NN
                          Discount calling plan (-)(I)  Discount calling plan (-)    Previous responder to mailing
                          Level of Int'l Comm. (I)      Tenure with firm             High Tech Msg
                          # of devices in house (-)     Referral plan                Letter (vs. postcard)
                          Revenue band                  High Tech model score (I)    Recent responder to mailing
                          Tenure with firm (-)          Region of country indicator  User of incentive credit card
                          International communicator    Belonged to loyalty program  Any children in house (-)
                          Belonged to loyalty program   Churner (-)
                          Referral plan                 College grad
                          Type of previous service      Tenure at residence (-)
                          Credit score                  Any children in house (-)
                          Number of adults in house     Child < 18 at home (-)
Beta hat for NN (95% CI)  0.68 (0.46, 0.91)             0.99 (0.49, 1.49)            0.84 (0.52, 1.16)
Take rate                 0.9%                          0.4%                         0.3%

Notes. Significant attributes from logistic regressions across loyalty levels (p < 0.05). Bold indicates significance at 0.01 level;
(-) indicates the effect of the variable was negative; (I) indicates a significant interaction with the NN variable.

5.4 Consumers Not Targeted 

As discussed above, only a select subset of our 
network-neighbor list was subject to marketing, based 
on relaxed thresholds on eligibility criteria. The re- 
mainder of the list, the nontarget network neighbors, 
made up the majority. Potential customers were omit- 
ted for various reasons: they were not believed to 
have high-tech capacity; they were on a do-not-contact 
list; address information was unreliable, and so on. 
Nonetheless, we were able to identify whether they 
purchased the product in the follow-up time period. 
The take rate for this group was 0.11%, and is shown 
relative to the target groups as the rightmost bar in 
Figure 4. Although they were not even marketed to, 
their take rate is almost half that for the non-NN 
targets — chosen as some of the best prospects by the 
marketing team. This group comprises consumers 
without any known favorable characteristics that 
would have put them on the list of prospects. The 
fact that they are network neighbors alone supports 
a relatively high take rate, even in the absence of 
direct marketing. This lends some support to an ex- 
planation of word-of-mouth propagation rather than 
homophily. 



Finally, we will briefly discuss the remainder of 
the consumer space — the non-NN nontarget group. 
Unfortunately, it is very difficult to estimate a take 
rate in this category, which could be considered a 
baseline rate for all of the other take rates. To do 
this, we would need to estimate the size of the space 
of all prospects. This includes all of the prospects the 
firm knows about, as well as customers of the firm's 
competitors and consumers who might purchase this 
product that do not have current telecommunica- 
tions service with any provider. It has been estab- 
lished that the size of the communications market 
is difficult to estimate (Poole, 2004); our best esti- 
mates of this baseline take rate put it at well below 
0.01%, at least an order of magnitude less than even 
the nontarget network neighbors. 

On the other hand, a by-product of our study 
is that we can upper-bound the effect of the mass 
marketing campaigns in general by comparing the 
target-NN group and the nontarget-NN group. The
ratio of take rates between the targeted network
neighbors and the nontargeted network neighbors
is about 10 to 1. This gap cannot all be
attributed to the marketing effect, since the targeted
group was specifically chosen to be better prospects
and it is likely that more of them would have signed
up for the service even in the total absence of marketing.
However, it does seem reasonable to call this
factor of 10 an upper bound on the effect of the
marketing.

5.5 Out-of-Sample Ranking Performance

These results suggest that we can give fine-grained 
estimations as to which customers are more or less 
likely to respond to an offer. Such estimations can be 
quite valuable: the consumer pool is immense and a 
campaign will have a limited budget. Therefore, being
able to pick a better list of "top-k" prospects
will lead directly to increased profit (assuming targeting
costs are not much higher for higher-ranked
prospects). In this section, we show that combining 
the network-neighbor attribute with the traditional 
attributes improves the ability to rank customers 
accurately. 

For each consumer, we create a record that com- 
prises all of the traditional attributes (trad atts), 
including loyalty, demographic and geographic at- 
tributes, as well as network-neighbor status. Note 
that in different business scenarios, different types 
and amounts of data are available. For example, 
for low-loyalty customers, very few descriptive at- 
tributes are known. We report results here using 
all attributes; the findings are qualitatively simi- 
lar for every different subset of attributes we have 
tried (namely, segment, loyalty, geography, demo- 
graphic). The response variable is the same as above 
and we used the same logistic regression models. We 
measure improvement in the ability to rank consumers on the binary
response variable by the increase in the Wilcoxon-Mann-Whitney
statistic, which is equivalent to the area under
the ROC curve (AUC). The ROC curve represents
the trade-off between false negative and false
positive rates at each possible cutoff on the predicted
probability score from the logistic regression
model. Specifically, the AUC is the probability
that a randomly chosen (as yet unseen) taker will 
be ranked higher than a randomly chosen nontaker; 
AUC = 1.0 means the classes are perfectly separated 
and AUC = 0.5 means the list is randomly shuffled. 
All reported AUC values are averages obtained us- 
ing 10-fold cross-validation. 
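
As a minimal sketch of the AUC definition above (not the code used in the study), the following Python function computes the Wilcoxon-Mann-Whitney estimate directly by comparing every taker/nontaker pair. The function name, the toy scores and the convention of counting ties as one half are illustrative assumptions; the text does not specify how ties are handled.

```python
def auc_wmw(scores, labels):
    """Estimate the probability that a random taker outscores a random nontaker.

    scores: predicted probabilities from any model
    labels: 1 for takers, 0 for nontakers
    Ties are counted as 0.5 (an assumed convention).
    """
    takers = [s for s, y in zip(scores, labels) if y == 1]
    nontakers = [s for s, y in zip(scores, labels) if y == 0]
    if not takers or not nontakers:
        raise ValueError("need at least one taker and one nontaker")
    wins = 0.0
    for t in takers:
        for n in nontakers:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(takers) * len(nontakers))


# Hypothetical toy data: 2 takers and 3 nontakers.
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 0, 0, 0]
toy_auc = auc_wmw(toy_scores, toy_labels)  # 5 of the 6 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of consumers, so it is only meant to make the definition concrete; a rank-based formula would be preferable on large lists.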

Table 6 shows the AUC values for the three loy- 
alty groups, quantifying the expected benefit from 
the improved logistic regression models. There is an 
increase in AUC for each group, with the largest increase
belonging to loyalty level 1, for which the least
information is available; note that here the ranking 
ability without the network information is not much 
better than random. 

To visualize this improvement, Figure 5(a) shows 
cumulative response ("lift") curves when using the 



model on loyalty group 3. The lower curve depicts 
the performance of the model using all traditional 
attributes, and the upper curve includes the tradi- 
tional marketing attributes and the network-neighbor 
attribute. In Figure 5(b), one can see the marked 
improvement that would be obtained from sending 
to the top-k prospects on the list. For example, for
the top 20% of the list, without the NN attribute, 
the take rate is 1.51%; with the NN attribute, it 
is 1.72%. The NN attribute does not improve the 
ranking for the top 10% of the list. 
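
The two quantities plotted in Figure 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the list cutoff is rounded to the nearest whole prospect, and the toy data are invented.

```python
def _ranked_labels(scores, labels):
    """Return the labels ordered from highest to lowest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [y for _, y in ranked]


def top_k_take_rate(scores, labels, fraction):
    """Take rate among the top `fraction` of prospects ranked by score."""
    ranked = _ranked_labels(scores, labels)
    k = max(1, int(round(fraction * len(ranked))))
    return sum(ranked[:k]) / k


def cumulative_response(scores, labels, fraction):
    """Share of all takers found in the top `fraction` of the ranked list."""
    ranked = _ranked_labels(scores, labels)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no takers in the data")
    k = max(1, int(round(fraction * len(ranked))))
    return sum(ranked[:k]) / total


# Hypothetical toy list of 10 prospects with 3 takers.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
rate_top20 = top_k_take_rate(toy_scores, toy_labels, 0.2)      # 1 taker among 2
lift_top50 = cumulative_response(toy_scores, toy_labels, 0.5)  # 2 of 3 takers
```

Evaluating `cumulative_response` over a grid of fractions traces a lift curve, and evaluating `top_k_take_rate` over the same grid traces a top-k curve.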

5.6 Improving Performance By Adding More 
Sophisticated Network Attributes 

Knowing whether a consumer is a network neigh- 
bor is one of the simplest indicators of consumer- 
to-consumer interaction that can be extracted from 
the network data. We now investigate whether aug- 
menting the model with more sophisticated social- 
network information can add additional value. In 
this section, we focus on the social network that 
comprises (only) the current customers of this ser- 
vice (which here we will call "the network"), along 
with the periphery of prospects who have communi- 
cated with those on the network (the network neigh- 
bors). We investigate whether we can improve tar- 
geting by using more sophisticated measures of so- 
cial relationship with the network of existing cus- 
tomers. 

Table 7 summarizes a set of additional social-network 
attributes that we add to the logistic regression. The 
terminology we use is borrowed to some degree from 
the fields of social-network analysis and graph the- 
ory. Social-network analysis (SNA) involves measur- 
ing relationships (including information transmis- 
sion) between people on a network. The nodes in 
the network represent people and the links between 



Table 6

ROC analysis: AUC values that result from the application
of logistic regression models

Loyalty    trad atts    trad atts + NN
1          0.54         0.60
2          0.64         0.67
3          0.60         0.64

Note. The logistic regression models were built using all
available attributes with (trad atts + NN) and without (trad
atts) the network-neighbor attribute. We see an increase in
AUC across all loyalty groups when the NN attribute is included
in the model.

Fig. 5. (a) Lift curves. Power of the segmentation curves for models built with all attributes with (trad atts + NN) and without
(trad atts) the network-neighbor attribute. The model with the NN attribute outperforms the model without it. For example,
if the firm sent out 50% of the mailing, they would get 70% of the positive responses with the NN compared to receiving only 
63% of the responses without it. (b) Top-k analysis. Consumers are ranked by the probability scores from the logistic regression 
model. The model that includes the NN attribute outperforms the model without. For example, for the top 20% of targets, the 
take rate is 1.51% without the NN attribute and 1.72% with the NN attribute. 



them represent relationships between the nodes. The 
SNA measures help quantify intuitive social notions, 
such as connectedness, influence, centrality, social 
importance and so on. Graph theory helps to un- 
derstand problems better by representing them as 
interconnected nodes, and provides vocabulary and 
methods for operating mathematically. 

Three of the attributes that we introduce can be 
derived from a prospect's local neighborhood (the 
set of immediate communication partners on the 
network; recall that these all are current customers). 
Degree measures the number of direct connections a 
node has. Within the local neighborhood, we also 
count the number of Transactions, and the length 
of those transactions (Seconds of communication). 
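
A minimal sketch of how these three local-neighborhood attributes could be derived from communication records follows. The record layout (prospect, customer, seconds) and all names are assumptions made for illustration; the paper does not describe the firm's data schema.

```python
from collections import defaultdict


def local_attributes(records):
    """Aggregate per-prospect Degree, Transactions and Seconds.

    records: iterable of (prospect_id, customer_id, seconds) tuples, one per
    communication between a prospect and a current customer before the
    mailing (hypothetical layout).
    """
    partners = defaultdict(set)
    transactions = defaultdict(int)
    seconds = defaultdict(float)
    for prospect, customer, secs in records:
        partners[prospect].add(customer)   # unique customers -> Degree
        transactions[prospect] += 1        # one record per transaction
        seconds[prospect] += secs          # total communication time
    return {
        p: {
            "degree": len(partners[p]),
            "transactions": transactions[p],
            "seconds": seconds[p],
        }
        for p in partners
    }


# Hypothetical toy records.
toy_records = [
    ("p1", "c1", 60),
    ("p1", "c1", 30),
    ("p1", "c2", 120),
    ("p2", "c3", 45),
]
toy_attributes = local_attributes(toy_records)
```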

The network is made up of many disjoint sub- 
graphs. Given a graph G = (V, E), where V is a set 
of vertices (nodes) and E is a set of links between
them, the connected components of G are the sets of 
vertices such that all vertices in each set are mutu- 
ally connected (reachable by some path) and no two 



vertices in different sets are connected. The size of 
the connected component may be an indicator for 
awareness of and positive views about the product. 
If a prospect is linked to a large set of "friends" all 
of whom have adopted the service, she may be more 
likely to adopt herself. Connected component size is 
the size of the largest connected component (in the 
network) to which the prospect is connected. 
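
The connected component size attribute can be sketched with a breadth-first search over the customer network. This is one straightforward way to compute it, not necessarily how it was computed in the study; the function names and the toy graph are hypothetical.

```python
from collections import deque


def component_sizes(customers, edges):
    """Map each customer to the size of the connected component containing it."""
    adjacency = {c: set() for c in customers}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    size_of = {}
    for start in adjacency:
        if start in size_of:
            continue
        # Breadth-first search collects one whole component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        for node in component:
            size_of[node] = len(component)
    return size_of


def connected_component_size(prospect_neighbors, size_of):
    """Largest component among those containing the prospect's customer contacts."""
    return max((size_of[c] for c in prospect_neighbors), default=0)


# Hypothetical toy network: one component of size 3 and one of size 2.
toy_sizes = component_sizes(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
toy_value = connected_component_size({"c", "d"}, toy_sizes)
```

A prospect with no customer contacts is given a value of 0 here, which is an assumed convention.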

We also move beyond a prospect's local neigh- 
borhood. Observing the local neighborhoods of a 
prospect's local neighbors, we can define a measure 
of social similarity. We define social similarity as 
the size of the overlap in the immediate network 
neighborhoods of two consumers. Max similarity is 
the maximum social similarity between the prospect 
and any neighbors of the prospect. Finally, the firm 
also can observe the prior dynamics of its customers. 
In particular, the firm can observe which customers 
communicated before and/or after their adoption as 
well as the date customers signed up. Using this in- 
formation, we define influencers as those subscribers 



Table 7
Network attribute descriptions

Attribute                   Description
Degree                      Number of unique customers communicated with before the mailing
Transactions                Number of transactions to/from customers before the mailing
Seconds of communication    Number of seconds communicated with customers before mailing
Connected to influencer     Is an influencer in prospect's local neighborhood?
Connected component size    Size of the connected component prospect belongs to
Max similarity              Max overlap in local neighborhood with any existing neighboring customer






S. HILL, F. PROVOST AND C. VOLINSKY 



Table 8
ROC analysis

Attribute(s)                                          AUC
Transactions                                          0.68
Seconds of communication                              0.68
Degree                                                0.59
Connected to influencer                               0.53
Connected component size                              0.55
Similarity                                            0.55
All network                                           0.71
All traditional (loyalty, demographic, geographic)    0.66
All traditional + all network                         0.71



Note. AUC values result from logistic regression models built 
on each of the constructed network attributes individually, as 
well as in combination. Results are presented for loyalty-level 
3 customers. 

who signed up for the service and, subsequently, we 
see one of their network neighbors sign up for the 
service. Connected to influencer is an indicator of 
whether the prospect is connected to one of these 
influencers. We appreciate that we do not actually 
know if there was true influence. 
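
The max similarity and connected to influencer attributes can be sketched as follows. This is a simplified illustration: it assumes the network is given as a dictionary of neighbor sets and that sign-up dates are comparable values, and it ignores whether a communication happened before or after adoption, which the firm can observe but which this sketch does not model. All names are hypothetical.

```python
def max_similarity(prospect, neighborhoods):
    """Largest neighborhood overlap between the prospect and any of its neighbors.

    neighborhoods: dict mapping each consumer to the set of consumers
    they communicate with (assumed representation).
    """
    own = neighborhoods.get(prospect, set())
    return max(
        (len(own & neighborhoods.get(nbr, set())) for nbr in own),
        default=0,
    )


def find_influencers(signup_date, neighborhoods):
    """Flag subscribers who have at least one neighbor that signed up later."""
    influencers = set()
    for subscriber, date in signup_date.items():
        for nbr in neighborhoods.get(subscriber, set()):
            if nbr in signup_date and signup_date[nbr] > date:
                influencers.add(subscriber)
                break
    return influencers


def connected_to_influencer(prospect, neighborhoods, influencers):
    """Indicator: is any influencer in the prospect's local neighborhood?"""
    return any(n in influencers for n in neighborhoods.get(prospect, set()))


# Hypothetical toy network: prospect "p" talks to customers "a" and "b".
toy_neighborhoods = {
    "p": {"a", "b"},
    "a": {"p", "b", "c"},
    "b": {"p", "a"},
    "c": {"a"},
}
toy_signup = {"a": 1, "b": 2, "c": 5}  # sign-up order, earlier = smaller
toy_influencers = find_influencers(toy_signup, toy_neighborhoods)
toy_flag = connected_to_influencer("p", toy_neighborhoods, toy_influencers)
toy_similarity = max_similarity("p", toy_neighborhoods)
```

As in the text, a flagged subscriber is only a candidate influencer: the later sign-up of a neighbor does not establish that any influence occurred.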

We use all of the aforementioned attributes and 
show AUC values for these predictive models in Ta- 
ble 8. We find that some of these network attributes 
have considerable predictive power individually and 
have even more value when combined. This is indi- 
cated by AUCs of 0.68 for both transactions and sec- 
onds of communication. We do not find high AUCs 
individually for connected component size, similar- 
ity or connected to influencer. Ultimately, we find 
that the logistic regression model built with the net- 
work attributes results in an AUC of 0.71 compared 
to an AUC of 0.66 without the network attributes — 
using only the traditional marketing attributes de- 
scribed in previous sections. (Recall that this repre- 
sents the ability to rank the network neighbors, who 
already have especially high take rates as a group, 
as we have shown.) 

Interestingly, when we combine the traditional at- 
tributes with the network attributes, there is no ad- 
ditional gain in AUC, even though many of these 
attributes were shown to be significant in the broader 
analysis above. The similarities represented implic- 
itly or explicitly in the network attributes seem to 
account for all useful information captured by tradi- 
tional demographics and other marketing attributes. 
That traditional demographics and other marketing 
attributes do not add value is not only of theoret- 
ical interest, but practical as well — for example, in 



cases such as this where demographic data must be 
purchased. 

Our result is further confirmed by the lift and 
take rate curves displayed in Figure 6(a) and (b), re- 
spectively. One can achieve substantially higher take 
rates using the new network attributes as compared 
to using the traditional attributes. For example, we 
find that for the top 20% of the targeted list, with- 
out the network attributes, the take rate is 2.2%; 
with the network attributes, it is 3.1%. Likewise, at 
the top 10% of the list, the take rate with the net- 
work attributes is 4.4% compared to 2.9% without 
them. 

6. LIMITATIONS 

We believe our study to be the first to combine 
data on direct customer communication with data 
on product adoption to show the effect of network- 
based marketing statistically. However, there are lim- 
itations in our study that are important to point 
out. 

There are several types of missing, incomplete or 
unreliable data which could influence our results. We 
have records of all of the communication (using the 
firm's service) to and from current customers of the 
service. That is not true for all the network-neighbor 
consumers. As such, we do not have complete information
about the network-neighbor targets (as well
as the non-network-neighbor targets). In addition,
some of the attributes we used were collected by 
purchasing data from external sources. These data 
are known to be at least partially erroneous and 
outdated, although it is not well known how much 
so. An additional problem is joining data on cus- 
tomers from external sources to internal communi- 
cation data, leading to missing data or sometimes 
just blatantly incorrect data. Finally, telecommuni- 
cations firms are not legally able to collect informa- 
tion regarding the actual content of the communi- 
cation, so we are not able to determine if the con- 
sumers in question discussed the product. In this 
regard, our data are inferior to some other domains 
where content is visible, such as Internet bulletin 
boards or product discussion forums. 

We expect the network-neighbor effect to manifest 
itself differently for different types of products. Most 
of the studies done to date on viral marketing have 
focused on the types of products that people are 
likely to talk about, such as a new, high-tech gadget
or a recently released movie. We expect there

Fig. 6. (a) Lift curves. Power of segmentation curves for models built with all traditional attributes, with (trad atts + net) 
and without (trad atts) the network attributes. If the firm sent out 50% of the mailing, they would have received 77% of the 
positive responses with the network attributes compared to receiving 63% of the responses without the network attributes. (b)
Top-k analysis. The model including the network attributes (trad atts + net) outperforms the model without them (trad atts).
For example, for the top 20% of targets ranked by score, the take rate is 2.2% without the network attributes and 3.1% with
the network attributes. 



to be less buzz for less "sexy" products, like a new 
deodorant or a sale on grapes at the supermarket. 
The study presented in this paper involves a new 
telecommunications service, which involves a new 
technology and features that consumers have per- 
haps never been exposed to before. The firm hopes 
the new technology and features are such that they 
would encourage word of mouth. 

What can we say about other products that might 
not be quite so buzz-worthy? To study this, we com- 
pared the new service studied here to a roll-out of 
another product by the same firm. This other prod- 
uct was simply a new pricing plan for an older telecom- 
munications service. Customers who signed up for 
this new plan could stand to save a significant amount 
of money, depending on their current usage patterns. 
However, the range and variety of telecommunica- 
tions pricing plans in the marketplace is so exten- 
sive and so confusing to the typical consumer that 
we do not believe that this is the type of product 
that would generate a lot of discussion between con- 
sumers. We refer to the two products as the pricing 
plan and the new technology. For the pricing plan, 
we have the same knowledge of the network as we do 
for the new technology. For those consumers who be- 
long to the pricing plan, we know who they commu- 
nicate with and then we can follow these network- 
neighbor candidates to see if they ultimately sign up 
for the plan. We construct a measure of "network 
neighborness" as follows. For a series of consecutive 
months, we gather data for all customers who or- 
dered the product in that month. We calculate the 



percentage of these new customers who were net- 
work neighbors, that is, those who had previously 
communicated with a user of the product. This per- 
centage is a measurement of the proportion of new 
sales being driven by network effects. By comparing 
this percentage across two products, we get insight 
into which product stimulates network effects more. 
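
The network-neighborness measure described above can be sketched as a simple monthly percentage. The input layout (one pair per new customer, with a flag for prior communication with a user of the product) and the function name are assumptions for illustration; the toy orders are invented.

```python
from collections import defaultdict


def network_neighborness(orders):
    """Percentage of each month's new customers who were network neighbors.

    orders: iterable of (month, was_network_neighbor) pairs, one per new
    customer (hypothetical layout).
    """
    total = defaultdict(int)
    neighbors = defaultdict(int)
    for month, was_nn in orders:
        total[month] += 1
        if was_nn:
            neighbors[month] += 1
    return {m: 100.0 * neighbors[m] / total[m] for m in sorted(total)}


# Hypothetical toy orders for two months.
toy_orders = [
    (1, True), (1, False), (1, False), (1, False),
    (2, True), (2, True), (2, False), (2, False),
]
toy_percentages = network_neighborness(toy_orders)
```

Computing this dictionary separately for each product gives the two series that are compared month by month.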

We now look at this value for our two products 
over an 8-month period. The time period for the 
two products was chosen so that it would be within 
the first year after the product was broadly available.
The results are shown in Figure 7. The two
main points to take away are that the new service
has a higher percentage of purchasers who are network
neighbors and that this percentage is increasing (except for the
dip in month 5). In contrast, the pricing plan has a
flat network-neighbor percentage, never rising
above 3%.

Interestingly, the dip in the plot for the new ser- 
vice corresponds exactly to the month of the direct 
marketing discussed earlier. Before the campaign, 
we can see that the network-neighbor effect was in- 
creasing, that more and more of the purchasers in 
a given month were network neighbors. During the 
mass marketing campaign, we exposed many non- 
network neighbors to the service and many of them 
ended up purchasing it, temporarily dropping the 
network-neighbor percentage. After the campaign, 
we see the network-neighbor percentage starting to 
increase again. 

Fig. 7. Network-neighborness plot for new service versus
pricing plan.

This network-neighborness measure should not be
confused with the success of the product, as the pricing
plan was quite successful from a sales perspective,
but it does suggest that the pricing plan is a
product that has less of a network-based spread of 
information. This difference might be due to the new 
service creating more word-of-mouth or perhaps we 
are seeing the effects of homophily. People who inter- 
act with each other are more likely to be similar in 
their propensity for purchasing the new service than 
in their propensity for purchasing a particular pric- 
ing plan. Again, the effects of word of mouth versus 
homophily are difficult to discern without knowing 
the content of the communication. 

7. DISCUSSION 

One of the main concerns for any firm is when, 
how and to whom they should market their prod- 
ucts. Firms make marketing decisions based on how 
much they know about their customers and poten- 
tial customers. They may choose to mass market 
when they do not know much. With more informa- 
tion, they may market directly based on some ob- 
served characteristics. We provide strong evidence 
that whether and how well a consumer is linked to 
existing customers is a powerful characteristic on 
which to base direct marketing decisions. Our re- 
sults indicate that a firm can benefit from the use 
of social networks to predict the likelihood of pur- 
chasing. Taking the network data into account im- 
proves significantly and substantially on both the 
firm's own marketing "best practices" and our best 
efforts to collect and model with traditional data. 

The sort of directed network-based marketing that 
we study here has applicability beyond traditional 
telecommunications companies. For example, eBay
recently purchased Internet-telephony upstart Skype 



for $2.6 billion; they now also will have large-scale, 
explicit data on who talks to whom. With gmail, 
Google's e-mail service, Google now has access to 
explicit networks of consumer interrelationships and 
already is using gmail for marketing; directed network- 
based marketing might be a next step. Various systems 
have emerged recently that provide explicit link- 
ages between acquaintances (e.g., MySpace, Friend- 
ster, Facebook), which could be fruitful fields for 
network-based marketing. As more consumers create
interlinked blogs, another data source arises. More
generally, these results suggest that many types of
firms could consider acquiring such linkage data,
much as purchase data are now collected
routinely by many types of
retail firms through loyalty cards. Even academic
departments could benefit from such data; for ex- 
ample, the enrollment in specialized classes could 
be bolstered by "marketing" to those linked to ex- 
isting students. Such links exist (e.g., via e-mail). 
It remains to design tactics for using them that are 
acceptable to all. 

It is tempting to argue that we have shown that 
customers discuss the product and that discussion 
helps to improve take rates. However, word of mouth 
is not the only possible explanation for our result. As 
discussed in detail above, it may be that the network 
is a powerful source of information on consumer 
homophily, which is in accord with social theories 
(Blau, 1977; McPherson, Smith-Lovin and Cook, 2001). 
We have tried to control for homophily by using a 
propensity-matched sample to produce our logistic 
regression model. However, it may well be that direct
communication between people is a better indicator
of deep similarity than any demographic or
geographic attributes. Either cause, homophily or 
word of mouth, is interesting both theoretically and 
practically.